A search is presented for the standard model Higgs boson produced in association with top 
quarks using the full Run II proton-antiproton collision data set, corresponding to 9.45 fb$^{-1}$,
The mechanism of electroweak symmetry breaking [1]
in the standard model (SM) predicts the existence
of a massive particle called the Higgs boson. The CDF
and D0 collaborations have reported evidence for a
particle consistent with the SM Higgs boson with a mass
between 120 and 135 GeV/$c^2$, produced in association
with a W or Z boson and decaying to two b quarks. The CMS
and ATLAS collaborations have reported the observation
of a particle consistent with the SM Higgs boson
with a mass of approximately 125 GeV/$c^2$, which
decays to two photons, two W bosons, or two Z bosons.
Many other predicted couplings of the SM Higgs boson 
are currently neither observed nor excluded. In the SM, 
the fermion masses are generated by Yukawa couplings 
between the Higgs and the fermion fields with coupling 
strength proportional to the fermion masses. As the most 
massive known fermion, the top quark is expected to cou- 
ple most strongly to the Higgs boson, which consequently 
may be produced relatively more abundantly in association
with a top-quark pair, via radiation or top-quark
fusion. Samples of top-quark pair events with a few
percent-level contamination from other processes can be
selected at CDF, offering smaller background uncertainties
than in searches for the SM Higgs boson produced
in association with a vector boson [8]. Hence, the
top-quark pair associated production channel provides an 
important contribution to SM Higgs boson physics. Fur- 
thermore, proposed extensions to the SM could signifi- 
cantly enhance the coupling between the top quark and 
the Higgs boson. This enhancement might allow the
observation of a non-SM Higgs boson in this search before 
reaching sensitivity to a SM Higgs boson, and could help 
to distinguish a candidate Higgs boson in other searches 
from the SM Higgs boson. 

This Letter reports a search for the SM Higgs boson 
produced in association with top quarks ($t\bar{t}H$). We utilize
the full data set recorded with the CDF II detector.
The data set consists of proton-antiproton collisions at
a center-of-mass energy of $\sqrt{s}$ = 1.96 TeV, and corresponds
to an integrated luminosity of 9.45 fb$^{-1}$. The
analysis described in this Letter extends and enhances a
previous CDF search, which used 319 pb$^{-1}$ [13], through
a vastly increased data set, greater signal acceptance and 
improved background discrimination. 

The CDF II detector is a general-purpose particle detector
described in Ref. [11]. It consists of a combined
silicon and drift chamber tracking system with a large
volume immersed in the 1.4 T field of a solenoid magnet
[12, 13], lead- and iron-scintillator sampling calorimeters,
and charged particle detectors outside the
calorimeter, which are used to identify muons [16]. A
right-handed cylindrical coordinate system is used with
the origin in the center of the detector, with $\theta$ and $\phi$
denoting the polar and azimuthal angles, respectively.
Pseudorapidity is defined as $\eta = -\ln\tan(\theta/2)$, and
transverse energy and momentum are $E_T = E \sin\theta$ and
$p_T = p \sin\theta$, where $E$ and $p$ are the energy and momentum,
respectively.
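
As a quick numerical illustration of these definitions, the sketch below evaluates the pseudorapidity and a transverse component for a given polar angle. The function names are hypothetical and chosen only for this example; they do not correspond to any CDF software.

```python
import math

def pseudorapidity(theta):
    """eta = -ln tan(theta/2), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))

def transverse(value, theta):
    """Transverse component: E_T = E sin(theta) or p_T = p sin(theta)."""
    return value * math.sin(theta)

if __name__ == "__main__":
    # Polar angle that corresponds to eta = 1.1 (inverse of the definition).
    theta = 2.0 * math.atan(math.exp(-1.1))
    print("theta [deg]:", math.degrees(theta))
    print("eta:", pseudorapidity(theta))
    print("E_T for E = 50 GeV:", transverse(50.0, theta))
```

A direction perpendicular to the beam (polar angle of 90 degrees) has zero pseudorapidity, and the transverse component then equals the full energy or momentum.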

The decay of a pair of top quarks is expected to gener- 
ate almost exclusively two W bosons and two b quarks. 
The W bosons may then decay to lepton-neutrino pairs, 
or pairs of quarks. We select events consistent with one 
leptonic and one hadronic W boson decay by requiring 
the presence of a single reconstructed lepton (electron or 
muon), missing transverse energy ($\not\!E_T$) [13], and four or
more calorimeter energy clusters (jets). At least two of
the jets in each event are required to be consistent with
the fragmentation of a b quark (b-tagged). Because a
low-mass ($m_H$ < 135 GeV/$c^2$) SM Higgs boson is expected
to decay mostly to pairs of b quarks, or to pairs
of W bosons that will decay predominantly to pairs of
u, d, s, or c quarks, large b-tag and jet multiplicities are
required by the selection. Approximately 90% of the selected
search sample is composed of top-quark pairs, with
the remainder consisting of W or Z bosons accompanied
by jets (W/Z + jets), single top quarks, dibosons, and
strong force mediated (QCD) multijets. Table I shows
the expected composition of the data sample.

To select events during data taking we require the presence
of a charged lepton (electron $e$ or muon $\mu$) candidate
with transverse momentum $p_T$ > 18 GeV/$c$. We
further require that the lepton candidate satisfies identi- 
fication quality requirements as in Ref. Q- We require 
that $\not\!E_T$ be greater than 10 GeV, 20 GeV, or 25 GeV
in events containing a muon candidate, an electron candidate
satisfying $|\eta|$ < 1.1, and an electron candidate
satisfying $|\eta|$ > 1.1, respectively. These $\not\!E_T$ requirements
are chosen to optimize the signal selection efficiency and
the rejection of instrumental backgrounds, which differ
in the three samples. Jets are reconstructed using
a cone-based clustering algorithm, with a cone radius
($R = \sqrt{\Delta\phi^2 + \Delta\eta^2}$) of 0.4 [18]. Jet energies are corrected
for instrumental effects [19], and the corrected jets
are required to have $E_T$ > 20 GeV and $|\eta|$ < 2.0. We use
two different algorithms to tag b jets as in Ref. [20]. One
algorithm relies on the reconstruction of secondary decay
vertices from long-lived hadrons within the jet cone [21],
while the other estimates the likelihood that not all tracks
in the jet cone intersect the beam line [22]. Jets identified
by either algorithm are considered as tagged, offering
higher tagging efficiency than obtained by the use of one
algorithm alone.
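
The kinematic requirements above can be collected into a short sketch. It is only an illustration under stated assumptions: the class, field, and function names are hypothetical, lepton identification quality is not modeled, and an electron with $|\eta|$ exactly equal to 1.1 is placed in the higher-threshold group because the text does not specify that case.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Jet:
    et: float           # corrected transverse energy in GeV
    eta: float          # pseudorapidity
    vertex_tag: bool    # tagged by the secondary-vertex algorithm
    track_tag: bool     # tagged by the track-based (beam line) algorithm

    @property
    def b_tagged(self) -> bool:
        # A jet counts as tagged if either algorithm tags it.
        return self.vertex_tag or self.track_tag

def met_threshold(lepton_type: str, lepton_eta: float) -> float:
    """Missing transverse energy threshold in GeV for each lepton sample."""
    if lepton_type == "muon":
        return 10.0
    if lepton_type == "electron":
        return 20.0 if abs(lepton_eta) < 1.1 else 25.0
    raise ValueError("lepton_type must be 'muon' or 'electron'")

def passes_selection(lepton_type: str, lepton_pt: float, lepton_eta: float,
                     met: float, jets: List[Jet]) -> bool:
    """Apply the lepton, missing-energy, jet, and b-tag requirements."""
    if lepton_pt <= 18.0:
        return False
    if met <= met_threshold(lepton_type, lepton_eta):
        return False
    good_jets = [j for j in jets if j.et > 20.0 and abs(j.eta) < 2.0]
    n_tags = sum(j.b_tagged for j in good_jets)
    return len(good_jets) >= 4 and n_tags >= 2
```

Taking the logical OR of the two taggers in `b_tagged` mirrors the statement that jets identified by either algorithm are considered tagged.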

We model the various backgrounds using a combination
of Monte Carlo (MC) simulation and data. We
simulate the $t\bar{t}$, diboson, W/Z + jets, and single-top
backgrounds using the POWHEG [23], PYTHIA [24],
ALPGEN [25], and MadEvent [13] generators, respectively.
We model the QCD multijet background using a data-driven
model. For backgrounds involving top quarks,
we have used $m_t$ = 172.5 GeV/$c^2$. Signal models are generated
by PYTHIA, with Higgs boson masses in 5 GeV/$c^2$
increments in the range 100 < $m_H$ < 150 GeV/$c^2$. The
CTEQ5L parton distribution functions [27] and a detailed
simulation of the response of the CDF II detector
using GEANT3 [28] are employed in all Monte Carlo
samples.

The search sample is subdivided into independent cat- 
egories of different expected signal-to-background ratio 
and background composition to maximize the search sensitivity
[29]. Under the selection requirements described
above, the reconstructed jet multiplicity spectrum in $t\bar{t}H$
events peaks at five jets, while the reconstructed jet multiplicity
spectrum for $t\bar{t}$ peaks at four jets. Hence, we
separate events with four, five, or six or more jets. The
jet multiplicity samples are then separated by b-tag multiplicity.
The events with six or more jets, at least three
of which are b-tagged, feature the largest expected signal-to-background
ratio and provide the most sensitivity for
a low-mass Higgs boson.
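
The subdivision can be sketched as a simple bookkeeping step. The jet bins follow the text; the two-way split by b-tag count used here is a simplifying assumption, since the exact tagging categories are not defined in this passage. All function names are hypothetical.

```python
from collections import Counter

def jet_bin(n_jets):
    """Jet-multiplicity bin: four, five, or six or more jets."""
    if n_jets < 4:
        raise ValueError("the selection requires at least four jets")
    return {4: "4 jets", 5: "5 jets"}.get(n_jets, ">=6 jets")

def tag_bin(n_btags):
    """Simplified b-tag bin (assumption: only two versus three or more)."""
    if n_btags < 2:
        raise ValueError("the selection requires at least two b tags")
    return "2 tags" if n_btags == 2 else ">=3 tags"

def categorize(events):
    """Count events per category; events is an iterable of (n_jets, n_btags)."""
    return Counter((jet_bin(nj), tag_bin(nb)) for nj, nb in events)

if __name__ == "__main__":
    toy_events = [(4, 2), (5, 2), (6, 3), (7, 4), (4, 2)]  # made-up inputs
    for category, count in sorted(categorize(toy_events).items()):
        print(category, count)
```

Because each event falls into exactly one bin, the categories are statistically independent and their likelihoods can later be multiplied.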

After defining our search sample, we enhance the isola- 
tion of a SM Higgs signal using artificial neural networks 

TABLE I: Expected number of events from the various processes composing our data sample, requiring two or more b tags,
with background rates and uncertainties taken from the posterior likelihoods. Uncertainties shown are correlated. Signal yields
are quoted assuming $m_H$ = 125 GeV/$c^2$.

Process                   4 jets          5 jets            ≥ 6 jets
$t\bar{t}$ + jets         962 ± 89        294 ± 27          77 ± 7.1
$t\bar{t} + b\bar{b}$     32 ± 27         17 ± 14           8.2 ± 6.9
W/Z + jets                105 ± 32        ZD 4- s n m o.u   7.1 ± 2.2
Multijet                  31 ± 16         0.0 ± 1.0         0.0 ± 1.0
Single top                19 ± 2.2        3.7 ± 0.43        0.61 ± 0.070
Diboson                   5.2 ± 0.44      1.2 ± 0.11        0.25 ± 0.025
Total background          1150 ± 106      340 ± 33          93 ± 11
Observed                  1133            368               114
$t\bar{t}H$               0.65 ± 0.075    1.1 ± 0.13        1.2 ± 0.14
WH                        0.52 ± 0.061    0.07 ± 0.008      negligible
ZH                        0.09 ± 0.011    0.02 ± 0.002      negligible
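
Using the central values quoted in Table I, the expected signal-to-background ratio per jet bin can be checked with a few lines of arithmetic. This is only a sketch: it ignores the quoted uncertainties, and the variable names are hypothetical. The table covers events with two or more b tags, so the ratios are coarser than those of the individual tagging categories.

```python
# Central values from Table I (events with two or more b tags).
total_background = {"4 jets": 1150.0, "5 jets": 340.0, ">=6 jets": 93.0}
tth_signal = {"4 jets": 0.65, "5 jets": 1.1, ">=6 jets": 1.2}

# Background components in the >=6 jets column, used as a cross-check
# of the quoted total: tt+jets, tt+bb, W/Z+jets, multijet, single top, diboson.
components_6j = [77.0, 8.2, 7.1, 0.0, 0.61, 0.25]

def signal_to_background(signal, background):
    """Ratio of expected signal to expected background in each bin."""
    return {name: signal[name] / background[name] for name in signal}

s_over_b = signal_to_background(tth_signal, total_background)
best_bin = max(s_over_b, key=s_over_b.get)

if __name__ == "__main__":
    for name, ratio in s_over_b.items():
        print(f"{name}: S/B = {ratio:.4%}")
    print("largest S/B:", best_bin)
    print("sum of >=6 jets components:", round(sum(components_6j), 2))
```

The ratio rises from about 0.06% with four jets to about 1.3% with six or more jets, and the six listed components in the last column add up to the quoted total of 93 after rounding.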




FIG. 1: Invariant mass of the two jets without b tags, in events 
containing exactly four jets and exactly two b tags. The peak 
of the distribution is consistent with hadronic decays of the W 
boson. The effect of systematic uncertainties is not shown. In 
the signal model shown, a Higgs boson of $m_H$ = 125 GeV/$c^2$
is assumed. 



FIG. 2: The mass of the vector sum of the four-momenta 
of the identified charged lepton, the neutrino, and all recon- 
structed jets in events with exactly five jets and at least two b 
tags. The effect of systematic uncertainties is not shown. In 
the signal model shown, a Higgs boson of $m_H$ = 125 GeV/$c^2$
is assumed. 



(NN) [30]. Each neural network is trained to separate
simulated Higgs signal events from background, with in- 
dividual networks optimized for each Higgs boson mass 
hypothesis in each of the previously described event categories.
Each network uses 18 input variables to discriminate
the Higgs boson signal from the backgrounds.
These variables are: missing transverse energy, maximum
jet $E_T$, second largest jet $E_T$, third largest jet $E_T$, maximum
$E_T$ among b-tagged jets, mean jet $E_T$, invariant
mass of the combination of all objects (jets, lepton, $\not\!E_T$),
vector sum of the transverse energies of all objects, scalar
sum of the transverse energies of all objects, scalar sum
of the transverse energies of all jets, number of energy
clusters with $E_T$ between 12 and 20 GeV, minimum separation
in $\eta$-$\phi$ space between b-tagged jets, separation in
azimuth between the lepton and the missing transverse
energy, transverse mass of the lepton and missing transverse
energy [31], mass of the vector sum of the lepton

FIG. 3: The output distribution for the discriminant optimized
for the $m_H$ = 125 GeV/$c^2$ hypothesis, for events with
six or more jets and three or more b tags. The effect of systematic
uncertainties is not shown. In the signal model shown, a
Higgs boson of $m_H$ = 125 GeV/$c^2$ is assumed.



and nearest jet in $\eta$-$\phi$ space, minimum mass of the vector
sum of any pair of jets, mass of the vector sum of the two
non-b-tagged jets with the largest $E_T$, and mass of the
vector sum of the two b-tagged jets with the largest $E_T$.
The modeling of the input distributions has been validated
in the subset of the data with exactly four jets and
exactly two b tags, which is expected to contain a negligible
number of signal events relative to the background yield.
Two of these distributions can be seen in Figs. 1 and 2,
and the output of the discriminant trained to identify a
Higgs boson of mass 125 GeV/$c^2$ is shown in Fig. 3.
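
A few of the listed input variables can be written down explicitly. The sketch below is an illustration, not the analysis code: the function names are hypothetical, and the transverse mass uses the common massless-lepton form $m_T = \sqrt{2\,p_T\,\not\!E_T\,(1 - \cos\Delta\phi)}$, which is assumed here to match the definition in the cited reference.

```python
import math
from itertools import combinations

def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into the range [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - dphi if dphi > math.pi else dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Separation in eta-phi space."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def min_delta_r(jets):
    """Smallest pairwise separation; jets is a list of (eta, phi) with >= 2 entries."""
    return min(delta_r(*a, *b) for a, b in combinations(jets, 2))

def transverse_mass(lepton_pt, lepton_phi, met, met_phi):
    """Transverse mass of the lepton and the missing transverse energy."""
    dphi = delta_phi(lepton_phi, met_phi)
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(dphi)))

def scalar_sum_et(jet_ets):
    """Scalar sum of the transverse energies of the jets."""
    return sum(jet_ets)

if __name__ == "__main__":
    b_jets = [(0.2, 0.1), (-0.5, 2.9), (1.1, -2.8)]  # made-up (eta, phi) values
    print("min dR between b-tagged jets:", min_delta_r(b_jets))
    print("m_T:", transverse_mass(40.0, 0.0, 40.0, math.pi))
    print("sum of jet E_T:", scalar_sum_et([60.0, 45.0, 30.0, 25.0]))
```

Wrapping the azimuthal difference matters for jets on either side of the $\phi = \pm\pi$ boundary, which would otherwise appear far apart.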

We consider several sources of systematic uncertainty 
that affect the rate of the involved processes and the 
shape of the discriminant distributions. Due to the high 
jet and b-tag multiplicities considered, the dominant systematic
uncertainties are associated with estimates of the
b-tag efficiency and the jet energy scale. These affect
both the rates and the discriminant shapes, and we estimate
the effects by independently varying the estimated
b-tag efficiency and the jet energy scale within one standard
deviation. These variations in jet energy scale and
tagging efficiency alter the expected acceptance for signal
and background by between 1% and 20%, depending on the
selection category. In addition, to account for uncertainties
on the theoretical cross sections of background processes,
we assume the following systematic uncertainties
on the normalization of simulated backgrounds: 6% for
diboson production, 6% for single top quark production,
10% for $t\bar{t}H$ production, and 40% for W/Z + jets [32-35].
Smaller uncertainties include those on the amount
of initial- and final-state radiation, parton-distribution 




[Plot legend: median expected; observed; median expected ± 1σ; median expected ± 2σ; standard model. Horizontal axis: Higgs boson mass (GeV/$c^2$), 100 to 150.]

FIG. 4: Expected and observed 95% C.L. upper limit as a
function of Higgs boson mass for 100 < $m_H$ < 150 GeV/$c^2$.



function choice, the probability to b-tag light-quark jets,
and a 6% uncertainty on the measurement of the integrated
luminosity [29], [36].

No measurement is available of the cross section for 
top-quark production with additional b quarks generated 
from QCD radiation. The next-to-leading-order correc- 
tions to leading-order calculations of the production rate 
of top-quark pairs with additional b quarks have been 
estimated to be on the order of a factor of two in some 
regions of phase space [37]. To account for this unknown
and potentially large systematic uncertainty, inclusive $t\bar{t}$
simulated events were separated into subsamples with additional
b quarks generated from QCD radiation ($t\bar{t} + b\bar{b}$),
and without ($t\bar{t}$ + jets). We assume an uncertainty of 10%
on the normalization of the $t\bar{t}$ + jets component and
an uncertainty of 100% on the normalization of the
$t\bar{t} + b\bar{b}$ component. We estimate the effect of individual
systematic uncertainties by calculating the expected exclusion
sensitivity considering all uncertainties, and then
comparing this value to that derived by considering all
but one uncertainty. The uncertainties due to the jet
energy scale, b-tag efficiency, inclusive top pair cross section,
and potential next-to-leading-order effects for $t\bar{t} + b\bar{b}$
individually degrade the expected exclusion sensitivity of
the analysis by 7.8%, 5.4%, 6.9%, and 9.0%, respectively.
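
The leave-one-out comparison described above can be sketched as follows. Two points are assumptions rather than statements from the text: the degradation is normalized to the limit obtained without the source in question, and `expected_limit` is a hypothetical callable standing in for the full limit calculation.

```python
def degradation(limit_all, limit_without):
    """Fractional worsening of the expected limit attributed to one source."""
    return (limit_all - limit_without) / limit_without

def leave_one_out(expected_limit, sources):
    """Impact of each source: compare all sources against all but one.

    expected_limit takes a frozenset of active source names and returns
    the expected limit computed with only those uncertainties included.
    """
    active = frozenset(sources)
    full = expected_limit(active)
    return {s: degradation(full, expected_limit(active - {s})) for s in sources}

if __name__ == "__main__":
    # Toy stand-in for the limit calculation; the numbers are invented.
    toy_weights = {"jet energy scale": 0.8, "b-tag efficiency": 0.5}

    def toy_expected_limit(active):
        return 10.0 + sum(toy_weights[name] for name in active)

    for name, value in leave_one_out(toy_expected_limit, toy_weights).items():
        print(f"{name}: {value:.1%}")
```

Because the sources are removed one at a time from the full set, the individual impacts need not add up to the total effect of all uncertainties.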

We compare the distribution of discriminant output 
observed in data to that of the expected background 
model. Observing no evidence for Higgs boson pro- 
duction in the discriminant distributions, we calculate 
a Bayesian 95% credibility level (C.L.) limit for each 
mass hypothesis using the combined binned likelihood 
of the NN output distributions. Each of the three jet-multiplicity
categories is subdivided into five independent
tagging categories. A posterior density is obtained
by multiplying this likelihood by Gaussian prior densities
for the background normalizations and systematic uncertainties,
leaving the cross section $\sigma(t\bar{t}H \to \ell + \not\!E_T + \mathrm{jets})$
with a uniform prior density, with priors truncated to
prevent negative predictions. A 95% C.L. limit is determined
such that 95% of the posterior density for the cross
section accumulates below the limit [38]. The expected
limits with one and two standard deviation uncertainty
bands and the observed limits are shown as a function of
assumed Higgs boson mass in Fig. 4. Because none of the
discriminant function input variables acts as an estimator 
for the reconstructed Higgs boson mass, the upper cred- 
ibility limits at different candidate Higgs boson masses 
are strongly correlated. An excess in the data produces 
an observed limit that exceeds the expected limit at all 
masses, at a level of approximately one standard deviation
compared to the background-only hypothesis.
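
The limit-setting idea can be illustrated with a much simpler case than the one used in the analysis: a single counting bin with a known background and a uniform prior on the signal yield. The sketch below integrates the Poisson posterior numerically and returns the signal value below which 95% of the posterior lies. The function name and grid settings are hypothetical, and nuisance parameters with Gaussian priors are deliberately left out.

```python
import numpy as np

def bayesian_upper_limit(n_obs, background, cl=0.95, n_grid=200001):
    """Upper limit on the signal yield s with a flat prior on s >= 0.

    Likelihood: Poisson probability of n_obs events given mean s + background.
    """
    s_max = 20.0 + n_obs + 10.0 * np.sqrt(n_obs + background + 1.0)
    s = np.linspace(0.0, s_max, n_grid)
    mu = s + background
    log_like = -mu
    if n_obs > 0:
        with np.errstate(divide="ignore"):  # log(0) when s = background = 0
            log_like = log_like + n_obs * np.log(mu)
    posterior = np.exp(log_like - log_like.max())  # unnormalized, flat prior
    # Cumulative integral of the posterior using the trapezoid rule.
    steps = 0.5 * (posterior[1:] + posterior[:-1]) * np.diff(s)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    return float(s[np.searchsorted(cdf, cl)])

if __name__ == "__main__":
    print(bayesian_upper_limit(0, 0.0))
    print(bayesian_upper_limit(3, 0.0))
```

With zero observed events the posterior is a falling exponential, so the 95% limit is $-\ln(0.05) \approx 3.0$ events regardless of the background; the full analysis replaces the single bin with the product of binned likelihoods over all categories.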

In conclusion, we have presented a search for a SM 
Higgs boson produced in association with a pair of top 
quarks, in a final state involving a lepton, missing transverse
energy, jets, and b-tagged jets. For a Higgs boson
mass of 125 GeV/$c^2$, we expect a limit of 12.6 and observe
a limit of 20.5 times the SM rate, which represents
agreement with the background-only prediction at the
level of approximately one standard deviation. The introduction
of neural networks and other improvements to
the techniques employed in this analysis produce a factor
of 17 improvement in sensitivity over the previous search
in this channel at CDF [10] and make this analysis the
most sensitive search for $t\bar{t}H$ to date.